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In  this  paper,  we  propose  a  new  representation  of  numeric  functions 
using  a  piecewise  arithmetic  expression.  To  represent  a  numeric  func¬ 
tion  compactly,  we  partition  the  domain  of  the  function  into  uniform 
segments,  and  transform  the  sub-function  in  each  segment  into  an  arith¬ 
metic  spectrum.  From  this  arithmetic  spectrum,  we  derive  an  arithmetic 
expression,  and  obtain  a  piecewise  arithmetic  expression  for  the  func¬ 
tion.  By  using  the  piecewise  arithmetic  expression,  we  can  increase 
the  number  of  zero  arithmetic  coefficients  significantly,  and  represent 
a  numeric  function  more  compactly  than  using  a  conventional  single 
arithmetic  expression.  We  also  present  an  application  of  the  piecewise 
arithmetic  expression  to  design  of  numeric  function  generators  (NFGs). 
Since  the  piecewise  arithmetic  expression  has  many  zero  coefficients 
and  repeated  coefficients,  by  storing  only  distinct  nonzero  coefficients 
in  a  table,  we  can  significantly  reduce  the  table  size  needed  to  store 
arithmetic  coefficients.  Experimental  results  show  that  the  table  size  can 
be  reduced  to  only  a  small  percent  of  the  table  size  needed  to  store  all 
the  arithmetic  coefficients. 

Keywords:  Piecewise  arithmetic  expressions,  nonzero  arithmetic  coefficients, 
numeric  function  generators  (NFGs),  programmable  architectures. 
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1  INTRODUCTION 

Numeric  functions,  such  as  trigonometric,  logarithmic,  square  root,  and  com¬ 
binations  of  these  functions,  have  a  wide  range  of  applications  including  com¬ 
puter  graphics,  digital  signal  processing,  communication  systems,  robotics, 
etc.  [5].  These  applications  usually  use  numeric  functions  as  a  basic  opera¬ 
tion,  as  well  as  addition  and  multiplication.  Particularly,  in  graphics  applica¬ 
tions  for  embedded  systems,  the  computation  of  numeric  functions  accounts 
for  about  half  of  the  total  processing  time  [13].  Thus,  for  numerically  inten¬ 
sive  or  real-time  applications,  hardware  accelerators,  called  numeric  function 
generators  (NFGs),  are  often  required.  The  computation  of  numeric  func¬ 
tions  has  been  studied  for  more  than  150  years  [22],  and  various  NFGs  have 
been  proposed  [2,4, 14, 17, 18].  Many  existing  NFGs  are  based  on  polynomial 
approximations. 

For  design  and  verification  of  arithmetic  circuits  such  as  adders  and 
multipliers,  the  arithmetic  transform  is  often  used  due  to  its  compact¬ 
ness  [1,  3,  15,  21,  23].  However,  for  the  design  of  NFGs,  it  is  rarely  used. 
Only  two  studies  of  NFGs  using  the  arithmetic  transform  are  known  [16,20]. 
However,  in  both  papers,  the  NFG  design  is  for  a  specific  numeric  function. 
That  is,  different  architectures  are  required  for  different  numeric  functions. 

Although  such  a  dedicated  NFG  for  a  specific  numeric  function  is  fast, 
many  NFGs  have  to  be  designed  for  a  wide  range  of  numeric  functions.  Since 
this  consumes  chip  area  and  accounts  for  much  of  the  design  and  production 
costs,  a  programmable  NFG,  that  can  compute  various  numeric  functions  at 
high-speed  with  a  single  architecture,  is  required,  along  with  a  systematic 
design  method.  To  satisfy  this  requirement,  this  paper  proposes  new  archi¬ 
tectures  and  a  design  method  for  programmable  NFGs  using  the  arithmetic 
transform.  In  [16,20],  the  arithmetic  transform  is  applied  to  the  whole  of  a 
numeric  function.  However,  this  is  unsuitable  for  the  design  of  programmable 
NFGs  because  it  requires  too  many  additions.  To  design  an  efficient  NFG, 
we  partition  the  domain  of  a  given  numeric  function  into  segments  of  equal 
widths,  and  apply  the  arithmetic  transform  to  the  sub-function  for  each  seg¬ 
ment.  From  the  arithmetic  spectrum  obtained  by  the  transform,  we  derive  an 
arithmetic  expression,  and  realize  the  arithmetic  expression  with  memory  and 
an  accumulator.  By  changing  the  memory  data,  we  can  realize  a  wide  range 
of  numeric  functions  with  a  single  architecture. 

This  paper  is  organized  as  follows:  Section  2  introduces  a  numeric  repre¬ 
sentation  of  a  real  numeric  function,  and  the  arithmetic  transform.  Section  3 
introduces  conventional  arithmetic  expressions  for  numeric  functions.  Sec¬ 
tion  4  presents  piecewise  arithmetic  expressions,  and  architectures  for  NFGs 
based  on  them.  Experimental  results  are  shown  in  Section  5.  And,  Section  6 
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presents  techniques  to  reduce  memory  size  and  to  improve  the  performance 
ofNFGs. 


2  PRELIMINARIES 
2.1  Number  Representation 

This  subsection  defines  functions  used  in  this  paper  and  a  number  representa¬ 
tion,  and  describes  how  to  convert  real  functions  into  other  types  of  functions. 

Definition  I.  Let  B  =  {0,  1},  Z  foe  the  set  of  the  integers,  and  K  foe  the  set  of 
the  real  numbers.  An  n -input  m -output  logic  function  is  a  mapping:  B"  — > 
B”,  a  (binary-input)  integer-valued  function  is  a  mapping:  B'^  ^  Z,  a  real 
function  is  a  mapping:  M  -»  M,  and  a  (binary-input)  real-valued  function 
is  a  mapping:  B"  — >  K. 

Definition  2.  A  numeric  function  is  a  real  function  built  from  a  combination 
of  constants,  a  variable,  four  arithmetic  operations  (-¥,  -,  *  /),  power  func¬ 
tions,  exponential  functions,  logarithmic  functions,  trigonometric  functions, 
and  inverse  trigonometric  functions. 

To  represent  real  values,  this  paper  uses  the  following: 

Definition  3.  A  value  X  represented  by  the  binary  fixed-point  representa¬ 
tion  is  denoted  by 

X  =  (Xh-\  Xh-2  ...  Xi  Xo-  X-l  X-2  .  .  ■  X-m)2, 

where  X;  €  {0,  l}/or  h  —  I  >  i  >  —m,  h  is  the  number  of  bits  for  the  integer 
part,  and  m  is  the  number  of  bits  for  the  fractional  part  ofX.  We  call 


h-l 


x= 


i=—m 


an  n-bit  fixed-point  representation  in  which  n  bits  are  used  to  represent  the 
value,  where  n  —  h  -\-  m.  In  this  paper,  an  n-bit  function  f(X)  means  that 
the  input  variable  X  has  n  bits. 

We  can  convert  a  real  function  in  fixed-point  representation  to  an  n -input 
m -output  logic  function.  The  logic  function,  in  turn,  can  be  converted  into 
an  integer-valued  function  by  considering  binary  vectors  as  integers.  That  is. 
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(a)  Table  for  sin(A). 

(b)  Truth  table  for  ft(X). 

(c)  Table  for  fi(X). 

(d)  Table  for  MX). 

X 

sinlJr) 

X  =  (O.X2XiXQ)2 

MX) 

X2XIX0 

fi(X) 

a:2Xia:o 

MX) 

0.000 

0.000 

0.000 

0.000 

000 

0 

000 

0.000 

0.125 

0.125 

0.001 

0.001 

001 

1 

001 

0.125 

0.250 

0.247 

0.010 

0.010 

010 

2 

010 

0.247 

0.375 

0.366 

0.011 

0.011 

oil 

3 

oil 

0.366 

0.500 

0.479 

0.100 

0.100 

100 

4 

100 

0.479 

0.625 

0.585 

0.101 

0.101 

101 

5 

101 

0.585 

0.750 

0.682 

0.110 

0.101 

no 

5 

110 

0.682 

0.875 

0.768 

0.111 

0.110 

111 

6 

111 

0.768 

TABLE  1 

Function  table  for  3-bit  sin(A). 


we  can  convert  a  real  function  into  an  integer- valued  function;  B"  P^, 
where  =  {0,  1, . . . ,  2'”  —  1},  by  using  a  fixed-point  representation.  And, 
similarly,  by  using  a  fixed-point  representation,  we  can  obtain  a  real-valued 
function:  B"  R. 

For  simplicity,  the  fixed-point  representation  of  X  is  denoted  hy  X  — 

(^Xn—\Xn—2  ■  •  ■  A;o)2- 

Example  1.  Table  1  (a)  shows  values  o/sin(A)/or  eight  values  ofX.  Using 
a  3-bit  fixed-point  representation,  this  function  is  converted  into  the  logic 
function  fb(X)  in  Table  1  (b).  By  representing  the  output  vectors  as  integers, 
we  have  the  integer-valued  function  ffX)  in  Table  1  (c).  And,  Table  1(d) 
shows  a  real-valued  function.  m 

2.2  Rounding  Error  and  Numeric  Function  Generator 

Definition  4.  Error  is  the  absolute  difference  between  the  exact  value  and  an 
approximated  value.  Rounding  error  is  an  error  that  is  caused  when  a  real 
value  is  represented  by  the  fixed-point  representation  with  a  finite  number  of 
fractional  bits.  It  is  no  greater  than  2“*'”+^  when  m  fractional  bits  are  used 
to  represent  a  real  value. 

Definition  5.  A  numeric  function  generator  or  NFG  is  a  logic  circuit  that 
computes  approximated  values  for  a  numeric  function  in  the  fixed-point  rep¬ 
resentation. 

NFGs  are  usually  designed  so  as  to  achieve  2“*^'"+'^  or  2“'"  worst  case 
error,  where  m  is  the  number  of  fractional  bits  in  the  output  of  NFGs. 
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2.3  Arithmetic  Transform 

This  subsection  introduces  the  arithmetic  transform,  the  arithmetic  spectrum, 
and  the  arithmetic  expression  [19]. 

First,  define  a  matrix  operation. 

Definition  6.  Let  Abe  an  (n  x  n)  square  matrix,  where 


flu 

fll2  . 

fl2I 

fl22  • 

■  ■  ^2n 

fl/i2 

•  •  ^nn 

Let  B  be  an  (n  X  n)  square  matrix.  Then,  the  Kronecker  product  of  A  and 
B  is  the  (n^  X  n^)  matrix: 


aiiB 

anB 

. .  ai«fi 

a2\B 

fl22.fi 

. .  fl2«fi 

Gnl  ^ 

fl„2fi  ■ 

.  .  OnnB 

Definition  7.  Given  a  matrix  M,  the  transposed  matrix  M'  is  obtained  by 
interchanging  rows  and  columns  of  M.  For  a  binary-input  function  f(X) 
(i.e.  integer-valued  or  real-valued  function),  the  function-vector  F  is  the 
column  vector  of  the  function  values  F  —  [/(OO  ...  0),  /(OO  . . .  01),  . . . , 
/(ll...l)]^ 

We  define  the  arithmetic  transform  and  the  arithmetic  spectrum  as  follows: 

Definition  8.  The  arithmetic  transform  matrix  is 

n 

-4(n)  =  (^  .4(1),  where  .4(1)  = 

i=I 

such  that  addition  and  multiplication  are  done  in  integer  arithmetic.  For  a 
binary-input  function  f  given  by  the  function-vector  F,  the  arithmetic  spec¬ 
trum  Af  —  [ao,  ai,  . . . ,  fl2"-i]*  is 


1  0 

-1  1 


Af  —  A(n)F. 

Each  a,  in  the  spectrum  is  called  an  arithmetic  coefficient. 
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Example  2.  Consider  the  l-bit  adder  function  f{x\,X2)  —  x\+X2-  The 
function-vector  is  F—  [0,  1,  1, 2]'.  The  arithmetic  spectrum  is 


Af  =  A(2)F  = 


0 

0  - 

■  0  ■ 

■  0  " 

0 

0 

1 

1 

1 

0 

1 

1 

1 

1  _ 

_  2  _ 

_  0  _ 

Similarly,  we  define  the  inverse  arithmetic  transform  as  follows: 

Definition  9.  Let  A~^(n)  be  the  inverse  arithmetic  transform  matrix 

defined  by 


i=l  L 

Definition  10.  In  a  symbolic  representation, 

.4-'(l)  =  [  1  X,-  ] . 

Therefore,  the  inverse  arithmetic  transform  is  defined  as 


f^XaAf,  x„  =  (g)[  1  X,]. 

i=l 

Example  3.  By  the  inverse  arithmetic  transform  from  the  arithmetic  spec¬ 
trum  obtained  in  Example  2,  the  function  f  is  represented  as  follows: 


f 


XaAf  =  [  1  X2  Xl  XlX2  ] 


0 

1 

1 

0 


Xl  +  X2. 


From  Definitions  9  and  10,  we  can  see  that  a  binary-input  function  f(X) 
can  be  represented  by  the  arithmetic  spectrum  and  the  inverse  arithmetic 
transform.  That  is. 
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Lemma  1.  Using  A  *(1)  and  A(l),  a  binary-input  function  f  is  represented 
as  follows: 


f 


y4-‘(lM(l)F  =  [  1 


Xi 


1  0  ■ 

fo 

-1  1  _ 

.  /i  . 

[  1  Xi  ] 


/o 

/l-/o 


/o  +  Xiifi  -  fo), 


(1) 


where  fo  =  /(x,-  =  0),  /i  =  /(x;  =  1).  (1)  is  the  arithmetic  transform 
expansion  (also  called  A-expansion  or  moment  decomposition  [1]).  By 
applying  the  arithmetic  expansion  to  f  recursively,  we  obtain  the  arithmetic 
expression  for  f: 


flo  +  aiXo  +  a2Xi  +  03X1X0  +  .  .  .  +  fl2"-I-rn-lX„_2  ■  •  ■  Xq, 

where  coefficients  O;  of  the  arithmetic  expression  for  f  are  the  arithmetic 
coefficients. 

Example  4.  The  arithmetic  spectrum  of  the  function  f  in  Table  1(c)  is 
[0,  1, 2,  0,  4,  0,  —1,  0]^  and  thus,  the  arithmetic  expression  for  f  is 

xo  +  2x1  +4x2  -X2X1. 

The  arithmetic  spectrum  of  the  function  fy  in  Table  1(d)  is  [0.000,0.125, 
0.247,  —0.006,  0.479,  —0.019,  —0.044,  0.026]^  and  the  arithmetic  expres¬ 
sion  for  fr  is 

0.125xo  +  0.247x1  —  0.006xiXo  +  0.479x2  —  0.019x2Xo  —  0.044x2Xi 

+  0.026x2X1X0. 


3  ARITHMETIC  EXPRESSIONS  EOR  NUMERIC  FUNCTIONS 

There  are  two  ways  to  obtain  arithmetic  expressions  from  given  numeric 
functions.  One  way  is  to  obtain  them  by  applying  the  arithmetic  trans¬ 
form  directly  to  real-valued  functions  of  given  functions.  The  arithmetic 
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expressions  produced  by  this  way  have  real-valued  arithmetic  coefficients. 
Thus,  in  this  paper,  we  call  them  real-valued  arithmetic  expressions. 

Another  way  is  by  applying  the  arithmetic  transform  to  integer-valued 
functions.  Since  this  produces  the  arithmetic  expressions  with  integer  coeffi¬ 
cients,  we  call  them  integer-valued  arithmetic  expressions. 

3.1  Numeric  Function  Generators  Based  on  Arithmetic  Expressions 

Integer- valued  arithmetic  expressions  can  be  directly  realized  without  any 
rounding  error  using  only  AND  gates  and  adders  [16,20].  Since  rounding 
error  is  caused  only  when  a  given  numeric  function  is  converted  into  an 
integer-valued  function,  the  error  of  an  NFG  based  on  the  integer-valued 
arithmetic  expression  is  at  most  where  m  is  the  number  of  fractional 

bits  in  the  output  of  the  NFG. 

Similarly,  real-valued  arithmetic  expressions  can  be  realized  using  only 
AND  gates  and  adders.  But,  to  realize  real-valued  arithmetic  expressions, 
all  real-valued  arithmetic  coefficients  have  to  be  rounded  to  a  finite  number 
of  fractional  bits.  Let  I  be  this  number,  and  let  n  be  the  number  of  input 
bits.  Since  the  number  of  arithmetic  coefficients  is  2",  the  worst  case  error 
to  realize  a  real-valued  arithmetic  expression  is  2"  x  In 

addition,  a  resulting  value  of  the  expression  is  also  rounded  to  m  fractional 
bits  in  the  output  of  an  NFG  based  on  the  real-valued  arithmetic  expression. 
Since  this  rounding  can  cause  at  most  2“i'"+b  error,  the  worst  case  error  of  the 
NFG  is  2“*'”+^^  +  2“i'“"+F.  Thus,  the  NFG  cannot  achieve  a  2“*'"+'^  worst 
case  error.  To  achieve  a  2“™  worst  case  error,  I  —  m  +  n  bits  are  required 
for  each  arithmetic  coefficient.  That  is,  real-valued  arithmetic  expressions 
require  more  bits  to  represent  each  arithmetic  coefficient  but  achieve  a  larger 
worst  case  error  than  integer-valued  arithmetic  expressions. 

However,  in  either  arithmetic  expression,  the  size  of  the  NFGs  is  strongly 
dependent  on  the  number  of  zero  arithmetic  coefficients.  When  many  arith¬ 
metic  coefficients  are  zero,  an  arithmetic  expression  can  be  realized  with  a 
compact  circuit  because  the  product  terms  with  zero  coefficients  in  the  arith¬ 
metic  expression  can  be  eliminated.  Thus,  in  the  next  subsection,  we  consider 
the  number  of  zero  arithmetic  coefficients. 

3.2  Number  of  Zero  Arithmetic  Coefficients 

The  number  of  zero  arithmetic  coefficients  in  an  arithmetic  expression 
depends  on  the  nonlinearity  of  the  original  numeric  function.  Since  dif¬ 
ferentiable  numeric  functions  can  be  expanded  into  polynomial  functions 
using  Maclaurin  expansion,  Taylor  expansion,  Chebyshev  expansion,  and  so 
on,  we  consider  polynomial  functions  to  investigate  a  correlation  between 
the  number  of  zero  arithmetic  coefficients  and  the  nonlinearity  of  numeric 
functions. 
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For  polynomial  functions,  the  following  lemma  holds: 


Lemma  2.  [8]  For  an  n-bit  kth-degree  polynomial  function  f(X)  =  CkX^  + 
cic-iX^~^  +  . . .  +  Co,  the  number  of  nonzero  arithmetic  coefficients  is  at  most 


From  this  lemma,  we  can  see  that  an  upper  bound  on  the  number  of 
nonzero  arithmetic  coefficients  increases  with  k  when  the  number  of  bits  n 
is  fixed,  as  shown  in  Figure  1(a).  Also,  as  shown  in  Figure  1(b),  in  which 
the  solid  line  denotes  the  number  of  nonzero  arithmetic  coefficients  and  the 
dashed  line  denotes  the  total  number  of  arithmetic  coefficients  (i.e.,  2"),  the 
number  of  zero  coefficients  increases  with  n  when  k  is  fixed.  That  is,  when 
k  is  sufficiently  smaller  than  n,  the  number  of  zero  coefficients  becomes 
larger. 

Thus,  a  lower  polynomial  degree  tends  to  produce  more  zero  arithmetic 
coefficients.  In  other  words,  numeric  functions  that  are  close  to  linear,  as 
shown  in  Figure  2(a),  have  more  zero  arithmetic  coefficients.  On  the  other 
hand,  highly  nonlinear  numeric  functions  as  shown  in  Figure  2(b)  have  fewer 
zero  arithmetic  coefficients. 

Table  2  shows  the  number  of  arithmetic  coefficients  for  various  16-bit 
numeric  functions.  In  this  table,  the  column  “Integer-valued”  shows  the  num¬ 
ber  of  arithmetic  coefficients  in  an  integer-valued  arithmetic  expression,  and 
the  column  “Real-valued”  shows  the  number  of  arithmetic  coefficients  in 
a  real-valued  arithmetic  expression.  For  integer-valued  arithmetic  expres¬ 
sions,  values  of  numeric  functions  are  rounded  to  16  fractional  bits,  and  they 
can  achieve  2“'^  worst  case  error.  On  the  other  hand,  for  real-valued  arith¬ 
metic  expressions,  arithmetic  coefficients  are  rounded  to  32  fractional  bits  to 
achieve  2“'®  worst  case  error. 

Table  2  shows  that  real-valued  arithmetic  expressions  have  more  zero 
coefficients  than  integer-valued  arithmetic  expressions.  This  is  because  real¬ 
valued  arithmetic  expressions  are  obtained  by  the  arithmetic  transform  using 
values  of  original  numeric  functions,  and  thus,  the  nonlinearity  of  numeric 
functions  directly  affects  arithmetic  coefficients.  That  is,  for  real- valued 
expressions,  the  number  of  zero  arithmetic  coefficients  increases  significantly 
when  the  original  numeric  functions  are  close  to  linear  functions. 

However,  real-valued  arithmetic  expressions  require  longer  bit  length  for 
each  arithmetic  coefficient.  This  increases  the  number  of  distinct  nonzero 
arithmetic  coefficients,  since  the  probability  that  the  same  coefficients  occur 
becomes  lower. 
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(a)  16-bit  feth-degree  polynomial  functions. 


(b)  n-bit  5th-degree  polynomial  functions. 


FIGURE  1 

Number  of  nonzero  coefficients  for  polynomial  functions. 


4  PIECEWISE  ARITHMETIC  EXPRESSIONS  FOR  NUMERIC 
FUNCTIONS 

This  section  introduces  piecewise  arithmetic  expressions,  and  presents  pro¬ 
grammable  architectures  for  NFGs  based  on  them. 

4.1  Piecewise  Arithmetic  Expressions 

As  described  in  the  previous  section,  arithmetic  expressions  can  be  realized 
with  only  AND  gates  and  adders,  and  thus,  they  are  realized  with  compact 
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(a)  sin(X)  (0  <  X  <  1) 


3.5 
3 

2.5 
2 

1.5 
1 

0.5 
0 

0  0.1  0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1 

(b)  V-  ln(X)  (0  <  X  <  1) 

FIGURE  2 

Graphs  of  two  numeric  functions. 


circuits  when  many  arithmetic  coefficients  are  zero.  Since  many  stan¬ 
dard  numeric  functions,  such  as  trigonometric  and  square  root  functions, 
have  many  zero  arithmetic  coefficients,  we  can  design  compact  NFGs  for 
them  [16,20]. 

However,  as  shown  in  Table  2,  highly  nonlinear  numeric  functions,  such 
as  In(^)  ln(2f),  have  few  zero  coefficients.  Even  for  numeric 

functions  close  to  linear  functions,  fixed-point  representations  with  many 
bits  (i.e.,  high  precision)  necessarily  yield  arithmetic  expressions  with  many 
product  terms.  In  both  cases,  the  resulting  NFGs  are  large  and  slow.  In  addi¬ 
tion,  a  straightforward  programmable  implementation  of  the  NFGs  proposed 
in  [16,20],  as  shown  in  Figure  3,  needs  too  many  adders  (2"  —  1  adders). 
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Functions 

nx) 

Integer-valued 

Real-valued 

Zero 

Distinct 

Zero 

Distinct 

V-  ln(A:) 

0 

8,235 

0 

20,365 

-Aln(X) 

2,436 

746 

1 

13,538 

2^ 

6,663 

148 

29,938 

616 

8,985 

172 

28,118 

823 

ln(X  -f  1) 

8,800 

164 

31,287 

1,190 

log2(X  -f  1) 

6,991 

157 

31,195 

1,278 

l/(X+l) 

9,312 

186 

31,270 

1,557 

VxTT 

9,810 

138 

32,752 

824 

sin(A') 

20,012 

140 

34,718 

636 

tan(A') 

14,921 

161 

9,657 

1,317 

sin- ‘(A) 

16,112 

180 

15,293 

2,171 

tan-' (A) 

16,703 

151 

30,796 

1,098 

TABLE  2 

Number  of  arithmetic  coefficients  for  16-bit  numeric  functions. 


Zero:  the  number  of  zero  coefficents. 

Distinct:  the  number  of  distinct  nonzero  coefficients. 
Domain  of  functions  is  0  <  X  <  1 . 


To  reduce  the  number  of  product  terms  (adders),  we  transform  sub¬ 
functions  into  a  set  of  the  arithmetic  spectra,  instead  of  transforming  the 
whole  domain  of  a  function  into  the  single  arithmetic  spectrum,  and  rep¬ 
resent  the  function  using  a  set  of  the  arithmetic  expressions.  Then,  we  design 
a  programmable  NFG  using  the  set  of  the  arithmetic  expressions. 

To  produce  a  set  of  the  arithmetic  expressions,  we  partition  the  domain 
of  a  given  numeric  function  into  uniform  segments,  and  apply  the  arithmetic 


f(X) 


FIGURE  3 

A  3-bit  programmable  NFG  based  on  an  arithmetic  expression. 
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transform  to  the  sub-function  in  each  segment.  Hence,  we  call  the  set  of  arith¬ 
metic  expressions  a  piecewise  arithmetic  expression.  Note  that,  in  the  piece- 
wise  arithmetic  expression,  we  partition  the  domain  into  segments  using  the 
most  significant  bits  (MSBs)  of  X.  Thus,  the  MSBs  are  used  to  select  a  seg¬ 
ment,  and  the  remaining  lower  bits  of  X  are  used  to  compute  an  arithmetic 
expression  for  the  segment.  Other  ways  exist  to  select  a  segment,  but  this  way 
is  especially  efficient. 

Let  k  be  the  number  of  bits  to  compute  an  arithmetic  expression  for  each 
segment.  Then,  arithmetic  coefficients  in  a  real-valued  arithmetic  expression 
for  each  segment  are  rounded  to  I  =  m  +  k  fractional  bits  because  the  number 
of  arithmetic  coefficients  is  2^.  Since  k  is  smaller  than  the  number  of  input 
bits  n,  the  piecewise  real-valued  arithmetic  expression  can  achieve  2“'”  worst 
case  error  with  fewer  bits  for  coefficients,  compared  to  the  single  real-valued 
arithmetic  expression. 

4.2  Number  of  Arithmetic  Coefficients  in  Piecewise  Arithmetic 
Expressions 

Table  3  shows  the  number  of  arithmetic  coefficients  in  piecewise  arithmetic 
expressions  for  various  16-bit  numeric  functions.  In  this  table,  the  column 
“Integer-valued”  shows  the  number  of  arithmetic  coefficients  in  a  piecewise 
integer-valued  arithmetic  expression,  and  the  column  “Real-valued”  shows 


Functions 

f(X) 

Integer-valued 

Real-valued 

Zero 

Distinct 

Zero 

Distinet 

V-  ln(A:) 

25,318 

866 

39,835 

3,073 

-X\n(X) 

28,308 

527 

29,866 

2,408 

2X 

30,012 

445 

19,593 

2,256 

28,946 

587 

17,251 

2,391 

ln(X  -P  1) 

28,622 

402 

36,212 

2,164 

log2(X  -P  1) 

30,467 

451 

33,803 

2,256 

l/(X+t) 

27,834 

401 

33,331 

2,023 

n/TTT 

28,220 

323 

34,535 

1,791 

sin(X) 

34,209 

391 

27,866 

2,060 

tan(X) 

32,139 

548 

4,153 

2,297 

sin-‘(3f) 

33,477 

532 

10,356 

2,362 

tan~‘(3^) 

33,073 

401 

28,638 

2,127 

TABLE  3 

Number  of  arithmetic  coefficients  in  piecewise  arithmetic  expressions  for  16-bit  numeric  func¬ 
tions. 

Zero:  the  number  of  zero  coefficents. 

Distinct:  the  number  of  distinct  nonzero  coefficients. 

Domain  of  functions  is  0  <  X  <  1 . 

The  number  of  MSBs  for  uniform  segmentation  is  8. 
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the  number  of  arithmetic  coefficients  in  a  piecewise  real-valued  arithmetic 
expression.  For  piecewise  integer-valued  arithmetic  expressions,  values  of 
numeric  functions  are  rounded  to  16  fractional  bits.  On  the  other  hand, 
for  piecewise  real-valued  arithmetic  expressions,  arithmetic  coefficients  are 
rounded  to  24  fractional  bits  because  8  bits  are  used  to  compute  an  expression 
for  each  segment. 

From  this  table,  we  can  see  that  using  piecewise  arithmetic  expressions 
increases  the  number  of  zero  arithmetic  coefficients  significantly,  even  for 
highly  nonlinear  functions.  This  is  because  in  a  segmented  local  domain, 
numeric  functions  become  close  to  linear  functions.  However,  for  exponen¬ 
tial  and  trigonometric  functions,  the  number  of  zero  coefficients  in  piecewise 
real-valued  arithmetic  expressions  is  smaller  than  that  in  single  real-valued 
arithmetic  expressions. 

For  these  functions,  polynomial  degrees  obtained  by  a  sufficiently  accu¬ 
rate  polynomial  expansion  of  the  segmented  numeric  functions  are  not  much 
different  than  the  polynomial  degree  obtained  from  the  original  whole  func¬ 
tion.  Since  a  real-valued  arithmetic  expression  represents  even  slight  curves 
of  a  numeric  function  in  a  local  domain  faithfully,  the  piecewise  real-valued 
arithmetic  expressions  for  these  functions  equivalently  represent  polynomial 
functions  with  the  same  degrees  as  the  original  functions  but  with  fewer  bits. 
As  shown  in  Figure  1(b),  if  the  polynomial  degree  is  not  changed,  the  number 
of  zero  coefficients  tends  to  decrease  with  decreasing  number  of  input  bits. 
Therefore,  for  piecewise  real-valued  arithmetic  expressions,  the  number  of 
zero  coefficients  decreases  since  only  the  least  significant  bits  (LSBs)  of  X 
are  used  for  the  expression  in  each  segment. 

In  this  way,  increasing  the  number  of  segments  can  increase  the  number 
of  zero  coefficients,  but  too  many  segments  result  in  fewer  zero  coefficients. 
That  is,  there  is  an  optimum  number  of  segments  (i.e.,  an  optimum  number 
of  MSBs)  to  produce  the  largest  number  of  zero  coefficients,  depending  on 
the  numeric  function.  Figure  4(a)  shows  the  optimum  numbers  for  piecewise 
integer-valued  arithmetic  expressions  of  sin(A)  and  ln(A),  and  for  real¬ 
valued  expressions  of  ln(A)  and  tan(A).  For  piecewise  integer-valued 
arithmetic  expressions  of  sin(A)  and  ln(2f),  10  MSBs  are  optimum.  For 
real-valued  expressions  of  In(^)  and  tan(A),  7  MSBs  and  2  MSBs  are 
optimum,  respectively. 

Figure  4(b)  shows  the  number  of  distinct  nonzero  coefficients  in  the  four 
piecewise  arithmetic  expressions.  For  piecewise  integer-valued  arithmetic 
expressions  of  sin(A)  and  ln(2f ),  2  MSBs  and  8  MSBs  are  optimum  to 

produce  the  smallest  number  of  distinct  nonzero  coefficients.  For  real-valued 
expressions  of  ln(2f)  and  tan(A),  7  MSBs  and  3  MSBs  are  optimum, 
respectively. 
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(a)  Number  of  zero  coefficients 


Number  of  MSBs  for  uniform  segmentation 


(b)  Number  of  distinct  nonzero  coefficients 

FIGURE  4 

Relation  between  the  number  of  coefficients  in  piecewise  arithmetic  expressions  and  the  number 
of  MSBs. 


4.3  Architectures  for  Programmable  NFGs  Based  on  Piecewise 
Arithmetic  Expressions 

By  realizing  a  set  of  the  arithmetic  spectra  for  a  piecewise  arithmetic  expres¬ 
sion  with  a  memory  (called  arithmetic  coefficients  table),  we  obtain  the  NFG 
in  Figure  5.  The  MSBs  of  X  select  a  segment  (an  arithmetic  spectrum),  and 
then  an  arithmetic  expression  is  computed  using  the  LSBs  of  X.  This  NFG 
requires  2^—1  adders,  where  k  is  the  number  of  the  LSBs.  Thus,  it  is  more 
compact  and  faster  than  an  NFG  based  on  the  single  arithmetic  expression  in 
Figure  3,  in  which  2"  —  1  adders  are  required. 
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m 


FIGURE  5 

Programmable  NFG  based  on  the  piecewise  arithmetic  expression. 


Unfortunately,  the  number  of  adders  is  still  large,  and  this  design  is  inef¬ 
ficient.  Since  the  arithmetic  spectra  usually  have  many  zero  coefficients  as 
shown  in  Table  3,  the  arithmetic  coefficients  table  is  sparse,  and  many  unnec¬ 
essary  additions  are  performed.  To  perform  only  necessary  additions,  and 
to  reduce  the  number  of  adders,  we  propose  the  architecture  shown  in  Fig¬ 
ure  6.  Note  that,  for  readability  of  the  figures,  the  reset  and  enable  signals  for 
registers  and  the  done  signal  (to  denote  completion  of  the  computation)  are 
omitted. 

In  this  architecture,  only  the  nonzero  arithmetic  coefficients  are  stored 
in  a  table  for  each  segment.  By  reading  out  each  coefficient  sequentially,  it 


(a)  Overall  architecture  (b)  Generalized  product  term 


FIGURE  6 

Programmable  NFG  based  on  a  sequential  computation. 
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fiX) 


FIGURE  7 

Improved  architecture  for  programmable  NFG. 


computes  the  arithmetic  expression  using  an  accumulator.  Each  product  term 
is  computed  with  the  circuit  in  Figure  6(b)  using  a  don’t  care  bit  pattern.  In 
the  don’t  care  bit  pattern,  bits  corresponding  to  input  variables  that  do  not 
appear  in  a  product  term  are  set  to  1 .  For  example,  for  a  5-bit  function,  the  bit 
pattern  for  the  product  term  x^xq  is  101 10.  In  this  architecture,  the  evaluation 
time  of  an  arithmetic  expression  is  proportional  to  the  number  of  nonzero 
arithmetic  coefficients.  Thus,  a  numeric  function  that  has  many  zero  arith¬ 
metic  coefficients  can  be  computed  at  high  speed. 

To  further  reduce  the  number  of  arithmetic  coefficients  to  be  stored  in  a 
table,  we  omit  repeated  coefficients  in  a  table.  Figure  7  shows  an  improved 
architecture.  By  using  pointers  to  the  distinct  arithmetic  coefficients  instead 
of  directly  storing  the  coefficients,  we  can  significantly  reduce  the  bit  width 
of  the  table  if  the  number  of  distinct  coefficients  is  small. 

The  proposed  architectures  can  realize  both  integer- valued  and  real- valued 
arithmetic  expressions.  However,  real-valued  arithmetic  expressions  require 
longer  bit  length  for  each  coefficient  and  larger  number  of  distinct  nonzero 
coefficients,  and  thus,  they  are  unsuitable  for  hardware  implementation. 
Therefore,  in  the  following,  we  focus  only  on  integer-valued  arithmetic 
expressions.  And,  the  proposed  architectures  use  a  fixed  number  of  MSBs  to 
select  a  segment.  As  shown  in  the  previous  subsection,  there  is  an  optimum 
number  of  MSBs  to  minimize  the  number  of  distinct  nonzero  arithmetic  coef¬ 
ficients,  depending  on  numeric  functions.  However,  we  have  to  consider  not 
only  the  number  of  arithmetic  coefficients  but  also  the  size  of  the  multiplex¬ 
ers  with  the  number  of  additions  for  the  proposed  architectures.  Therefore,  in 
this  paper,  we  use  about  one-half  of  the  input  bits  for  uniform  segmentation 
to  balance  the  size  of  the  multiplexers  with  the  number  of  additions. 
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Functions 

fiX) 

No.  of  stored  coefficients 

Size  of  coefficients  table 

Total  size  of  tables 

No.  of  additions 

NFGl  NFG2  NFG3 

(all)  (nonzero)  (distinct) 

NFGl 

(bits) 

NFG2 

(bits) 

NFG3 

(bits) 

NFGl 

(bits) 

NFG2 

(bits) 

NFG3 

(bits) 

NFGl 

NFG2,3 

(average) 

V-ln(X) 

65,536 

40,218 

866 

1,179,648 

723,924 

15,588 

1,179,648 

1,085,886 

779,730 

255 

157.1 

-XMX) 

65,536 

37,228 

527 

983,040 

558,420 

7,905 

983,040 

893,472 

715,237 

255 

145.4 

65,536 

35,524 

445 

1,114,112 

603,908 

7,565 

1,114,112 

923,624 

646,997 

255 

138.8 

e-* 

65,536 

36,590 

587 

1,179,648 

658,620 

10,566 

1,179,648 

987,930 

705,776 

255 

142.9 

ln(X  +  1) 

65,536 

36,914 

402 

1,048,576 

590,624 

6,432 

1,048,576 

922,850 

670,884 

255 

144.2 

iog2(x  -r  1) 

65,536 

35,069 

451 

1,048,576 

561,104 

7,216 

1,048,576 

876,725 

638,458 

255 

137.0 

65,536 

37,702 

401 

1,114,112 

640,934 

6,817 

1,114,112 

980,252 

685,453 

255 

147.3 

VxTi 

65,536 

37,316 

323 

1,114,112 

634,372 

5,491 

1,114,112 

970,216 

677,179 

255 

145.8 

sin(X) 

65,536 

31,327 

391 

1,048,576 

501,232 

6,256 

1,048,576 

783,175 

570,142 

255 

122.4 

tan(X) 

65,536 

33,397 

548 

1,114,112 

567,749 

9,316 

1,114,112 

868,322 

643,859 

255 

130.5 

sin-‘(X) 

65,536 

32,059 

532 

1,114,112 

545,003 

9,044 

1,114,112 

833,534 

618,165 

255 

125.2 

tan-‘(X) 

65,536 

32,463 

401 

1,048,576 

519,408 

6,416 

1,048,576 

811,575 

590,750 

255 

126.8 

TABLE  4 

Table  sizes  and  the  number  of  additions  for  16-bit  NFGs. 


NFGl:  the  NFG  shown  in  Figure  5.  NFG2:  the  NFG  shown  in  Figure  6.  NFG3:  the  NFG  shown  in  Figure  7. 
No.  of  additions:  the  number  of  additions  needed  to  compute  each  arithmetic  expression. 

The  number  of  MSBs  for  uniform  segmentation  is  8.  The  domain  of  all  functions  is  0  <  X  <  1 . 


5  EXPERIMENTAL  RESULTS 

To  show  the  efficiency  of  the  proposed  NFGs,  we  compare  the  table  sizes 
and  the  numbers  of  additions  for  the  three  proposed  NFGs.  Table  4  shows 
the  experimental  results.  In  this  table,  the  column  “No.  of  additions”  denotes 
the  number  of  additions  needed  to  compute  an  arithmetic  expression  for  each 
segment.  Since,  in  the  NFGs  in  Figures  6  and  7,  the  numbers  of  product 
terms  in  arithmetic  expressions  for  different  segments  are  different,  the  aver¬ 
age  number  of  additions  for  each  expression  is  shown. 

Since  the  programmable  NFG  based  on  the  single  arithmetic  expression 
in  Figure  3  requires  2'®  =  65,  536  registers  and  2'®  —  1  =  65,  535  adders, 
the  proposed  NFGs,  based  on  the  piecewise  arithmetic  expressions,  require 
several  orders  of  magnitude  fewer  adders,  and  much  less  storage  size.  As 
shown  in  Table  4,  for  many  numeric  functions,  the  number  of  nonzero  arith¬ 
metic  coefficients  and  the  number  of  distinct  arithmetic  coefficients  are  small. 
Thus,  by  storing  only  these,  we  significantly  reduce  size  of  the  arithmetic 
coefficients  tables,  resulting  in  a  reduction  of  total  size  of  tables. 

6  IMPROVEMENT  TECHNIQUES  FOR  NFGS 
6.1  Piecewise  Polynomial  Approximation 

By  using  a  polynomial  approximation,  we  can  reduce  the  number  of  nonzero 
arithmetic  coefficients,  and  thus,  the  table  size  and  the  number  of  additions 
can  be  further  reduced.  This  is  based  on  Lemma  2. 
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We  approximate  a  given  numeric  function  using  a  piecewise  polynomial 
within  a  desired  error,  and  then  transform  the  polynomial  into  an  arithmetic 
expression  in  each  segment.  The  piecewise  arithmetic  expression  obtained  in 
this  way  is  realized  with  a  compact  NFG. 


Example  5.  Consider  a  piecewise  quadratic  polynomial  approximation  of 
a  16-bit  numeric  function  using  256  uniform  segments.  Then,  a  polynomial 
in  each  segment  has  8  bits.  Thus,  the  total  number  of  nonzero  arithmetic 
coefficients  is  at  most 


=  256  X  37  =  9,  472, 


and  the  number  of  additions  is  only  36. 


In  this  way,  by  using  piecewise  polynomial  approximation,  more  compact 
and  faster  programmable  NFGs  based  on  the  piecewise  arithmetic  expression 
can  be  produced. 


6.2  Parallel  Computation 

The  proposed  NFGs  based  on  a  sequential  computation  in  Figures  6  and  7 
produce  an  arithmetic  coefficient  one  by  one,  and  compute  each  product  term 
of  an  arithmetic  expression  sequentially  using  only  one  adder.  Thus,  they 
require  0{N)  computation  time,  where  N  is  the  number  of  product  terms. 
As  shown  in  Table  4,  they  require  about  120  to  160  additions  on  average, 
to  compute  a  numeric  function  value,  even  though  many  product  terms  are 
reduced  by  zero  coefficients. 

On  the  other  hand,  the  NFG  in  Figure  5  produces  all  arithmetic  coefficients 
simultaneously,  and  adds  all  product  terms  of  an  arithmetic  expression  at 
once  using  an  adder  tree.  Thus,  it  requires  G(logA)  computation  time.  In 
Table  4,  it  computes  a  function  value  in  delay  time  for  an  8-level  adder  tree 
with  255  adders.  Therefore,  the  NFG  in  Figure  5  is  faster  but  requires  many 
more  adders  than  the  proposed  NFGs.  Note  that  the  adders  have  different 
sizes.  However,  in  an  FPGA  or  ASIC,  this  is  not  a  problem  because  different 
size  adders  can  be  easily  accommodated. 

These  two  designs  are  extreme  cases.  By  changing  the  number  of  product 
terms  to  be  computed  in  parallel,  we  can  explore  the  design  space  taking  into 
account  a  trade-off  between  the  number  of  adders  and  the  computation  time, 
and  can  produce  an  optimum  NFG  depending  on  applications. 
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7  CONCLUSION  AND  COMMENTS 

This  paper  proposes  a  new  representation  of  numeric  functions  using  a  piece- 
wise  arithmetic  expression  and  new  architectures  for  programmable  NFGs 
based  on  the  piecewise  arithmetic  expression.  By  using  a  piecewise  arith¬ 
metic  expression,  we  can  increase  the  number  of  zero  arithmetic  coefficients 
significantly,  and  design  a  more  compact  NFG.  Experimental  results  show 
that  the  size  of  the  arithmetic  coefficients  table  in  an  NFG  can  be  reduced  to 
only  a  few  percent  of  the  table  size  needed  to  store  all  the  arithmetic  coeffi¬ 
cients.  By  using  the  proposed  NFGs,  we  can  realize  a  wide  range  of  numeric 
functions  with  a  single  architecture,  and  we  can  switch  the  functions  by  only 
changing  the  contents  of  tables. 

The  proposed  design  method  for  the  NFGs  can  explore  the  design  space 
taking  into  account  a  trade-off  between  the  number  of  adders  and  the  com¬ 
putation  time  by  changing  the  number  of  product  terms  to  be  computed  in 
parallel.  To  easily  find  an  optimum  NFG  for  each  application,  we  will  define 
a  figure  of  merit  (FoM)  for  NFGs  in  our  future  work. 
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